AUDIENCE SURVEY SYSTEM AND METHOD

Cross-Reference To Related Application 
The present application is a continuation-in-part of U.S. Patent Application Serial
No. 09/441,539, filed November 16, 1999, which claimed the benefit of Provisional
Patent Application Serial No. 60/140,190, filed June 18, 1999.

Field Of The Invention 
The invention relates to a system and method for automatically generating a report
of estimated audience size and characteristics, and providing the report to subscribers of
the report. In one embodiment, the report corresponds to a ratings report that indicates
audience size associated with, for example, radio and/or television stations in a particular
electronic media market.

Discussion Of The Prior Art 
Radio and television surveys have been conducted for many years to determine the
relative popularity of programs and broadcast stations. This information is necessary for a
number of reasons including the determination of advertising price structure and deciding
if certain programs should be continued or canceled. One of the most common methods
for performing these surveys is for survey members to manually record the radio and
television stations that they listen to and watch at various times of day. Maintaining
these manual logs is cumbersome, and the logs are often inaccurate. Additionally,
transferring the information in the logs to an automated system is an additional
time-consuming process.

Various systems have been developed that provide a degree of automation to
conducting these surveys. In a typical semiautomatic survey system an electronic device
records which television station is being viewed in a survey member's home. The survey
member may optionally enter the number of people who are viewing the program. These
data are electronically transferred to a central location where survey statistics are
compiled.

Automatic survey systems have been devised that substantially improve efficiency.
Many of the methods used involve the injection of a coded identification signal within the
audio or video. There are several problems with these so-called active identification
systems. First, each broadcaster must cooperate with the survey organization by installing
the coding equipment in its broadcast facility. This represents an additional expense and
complication to the broadcaster that may not be acceptable. The use of identification
codes can also result in audio or video artifacts that are objectionable to the audience. An
active encoding system is described by Best et al. in U.S. Patent 4,876,617. Best employs
two notch filters to remove narrow frequency bands from the audio signal. A frequency
shift keyed signal is then injected into these notches to carry the identification code. Codes
are repeatedly inserted into the audio when there is sufficient signal energy to mask the
codes. However, when the injection level of the code is sufficient to assure reliable
decoding it is perceptible to listeners. Conversely, when the code injection level is reduced
to become imperceptible decoding reliability suffers. Best has improved on this invention
as taught in U.S. Patent 5,113,437. This system uses several sets of code frequencies and
switches among them in a pseudo-random manner. This reduces the audibility of the
codes.

Fardeau et al. describe a different type of system in U.S. Patent 5,574,962 and U.S.
Patent 5,581,800 where the energy in one or more frequency bands is modulated in a
predetermined manner to create a coded message. A small body-worn (or carried) device
receives the encoded audio from a microphone and recovers the embedded code. After
decoding, the identification code is stored for later transfer to a central computer. The
problem remains that all broadcast stations to be detected by the system must be
persuaded to install code generation and insertion equipment in their audio feeds.

Broughton et al. describe a video signaling method in U.S. Patent 4,807,031 that
encodes a message by modulating the relative luminance of the two fields comprising a
video frame. While intended for use in interactive television, this method can also be used
to encode a channel identification code. An obvious limitation is that this method cannot
be used for radio broadcasts. Additionally, the television broadcast equipment must be
altered to include the identification code insertion.

Passive signal recognition techniques have been developed for the identification of
prerecorded audio and video sources. These systems use the features of the signal itself as
the identification key. The unknown signal is then compared with a library of similarly
derived features using a pattern recognition procedure. One of the earliest works in this
area is presented by Moon et al. in U.S. Patent 3,919,479. Moon teaches that correlation
functions can be used to identify audio segments by matching them with replicas stored in
a database. Moon also describes the method of extracting sub-audio envelope features.
These envelope signals are more robust than the audio itself, but Moon's approach still
suffers from sensitivity to distortion and speed errors.

A multiple stage pattern recognition system is described by Kenyon et al. in U.S.
Patent 4,843,562. This method uses low-bandwidth features of the audio signal to quickly
determine which patterns can be immediately rejected. Those that remain are subjected to
a high-resolution correlation with time warping to compensate for speed errors. This
system is intended for use with a large number of candidate patterns. The algorithms used
are too complex to be used in a portable survey system.

Another representative passive signal recognition system and method is disclosed
by Lamb et al. in U.S. Patent 5,437,050. Lamb performs a spectrum analysis based on the
semitones of the musical scale and extracts a sequence of measurements forming a
spectrogram. Cells within this spectrogram are determined to be active or inactive
depending on the relative power in each cell. The spectrogram is then compared to a set of
reference patterns using a logical procedure to determine the identity of the unknown
input. This technique is sensitive to speed variation and even small amounts of distortion.

Kiewit et al. have devised a system specifically for the purpose of conducting
automatic audience surveys as disclosed in U.S. Patent 4,697,209. This system uses
trigger events such as scene changes or blank video frames to determine when features of
the signal should be collected. When a trigger event is detected, features of the video
waveform are extracted and stored along with the time of occurrence in a local memory.
These captured video features are periodically transmitted to a central site for comparison
with a set of reference video features from all of the possible television signals. The
obvious shortcoming of this system is that it cannot be used to conduct audience surveys
of radio broadcasts.

The present invention combines certain aspects of several of the above inventions,
but in a unique and novel manner to define a system and method that is suited to
conducting audience surveys of both radio and television broadcasts.

Summary Of The Invention 
The present invention is directed to a system and method for generating a report of
estimated audience size and characteristics and providing the report to subscribers of the
report. A plurality of portable monitoring units are provided to users that are members of
an audience panel. Each portable monitoring unit records information representative of
programming content of free field audio signals received by the portable monitoring unit.
Information representative of audio signals broadcast from a plurality of program sources
is recorded at a central broadcast collection facility. For each audio signal, the information
recorded by the central broadcast collection facility includes information representing
programming content of the audio signal. Program source information is then compiled by
identifying the program source selected by each user of a portable monitoring unit during
each of a plurality of different time periods in accordance with a match between the
information recorded by the portable monitoring units and the information recorded by the
central broadcast collection facility. A report of estimates of audience size and
characteristics is generated based on the observed behavior of the audience panel and in
accordance with the compiled source information. The report of estimates of audience size
and characteristics is provided to subscribers in exchange for payment of a subscription
fee.

In a preferred embodiment of the present invention, audience surveys are
accomplished using a number of body-worn portable monitoring units. These units
periodically sample the acoustic environment of each survey member using a microphone.
The audio signal is digitized and features of the audio are extracted and compressed to
reduce the amount of storage required. The compressed audio features are then marked
with the time of acquisition and stored in a local memory. A central computer extracts
features from the audio of radio and television broadcast stations using direct connection
to a group of receivers. The audio is digitized and features are extracted in the same
manner as for the portable monitoring units. However, the features are extracted
continuously for all broadcast sources in a market. The feature streams are compressed,
time-marked and stored on the central computer disk drives.

In the preferred embodiment, when the portable monitoring units assigned to
survey members are not being worn (or carried), they are stored in docking stations that
recharge the batteries and also provide modems and telephone access. On a daily basis, or
every several days, the central computer interrogates the docked portable monitoring unit
using the modem and transfers the stored feature packets to the central computer for
analysis. This is done late at night or early in the morning when the portable monitoring
unit is not in use and the phone line is available. In addition to transferring the feature
packets, the current time marker is transferred from the portable monitoring unit to the
central computer. By comparing the current time marker with the time marker transferred
during the last interrogation the central computer can determine the apparent elapsed time
as seen by the portable monitoring unit. The central computer then makes a similar
calculation based on the absolute time of interrogation and the previous interrogation time.
The central computer can then perform the necessary interpolations and time translations
to synchronize the feature data packets received from the portable monitoring unit with
feature data stored in the central computer.
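
One way to picture the time translation described above is a linear interpolation between two successive interrogations. The following Python sketch is illustrative only. It assumes that the portable unit's clock drifts at a constant rate between interrogations, and the function and parameter names are hypothetical rather than taken from the specification.

```python
def unit_to_absolute(timer_value, prev_timer, curr_timer, prev_abs, curr_abs):
    """Map a portable-unit timer value onto the central computer's time base.

    Assumption of this sketch: the unit clock drifts at a constant rate
    between the previous and the current interrogation.
    """
    unit_elapsed = curr_timer - prev_timer   # apparent elapsed time seen by the unit
    abs_elapsed = curr_abs - prev_abs        # elapsed time seen by the central computer
    if unit_elapsed <= 0:
        raise ValueError("current timer value must be later than the previous one")
    scale = abs_elapsed / unit_elapsed       # ratio that corrects for clock drift
    return prev_abs + (timer_value - prev_timer) * scale
```

For example, if the unit counted 1000 ticks while the central computer measured 1010, a packet marked at tick 500 would be placed near 505 on the central time base. Any residual error is expected to fall within the correlation window used during matching.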

By comparing the audio feature data collected by a portable monitoring unit with
the broadcast audio features collected at the central computer site, the system can
determine which broadcast station the survey member was listening to at a particular time.
This is preferably accomplished by computing cross-correlation functions for each of
three audio frequency bands between the unknown feature packet and features collected at
the same time by the central computer for many different broadcast stations. The fast
correlation method based on the FFT algorithm is used to produce a set of normalized
correlation values spanning a time window of approximately six seconds. This is
sufficient to cover residual time synchronization errors between the portable monitoring
unit and the central computer. The correlation functions for the three frequency bands will
each have a value of +1.0 for a perfect match, 0.0 for no correlation, and -1.0 for an exact
opposite. These three correlation functions are combined to form a figure of merit that is a
three-dimensional Euclidean distance from a perfect match. This distance is calculated as
the square root of the sum of the squares of the individual distances, where the individual
distance is equal to (1.0 - correlation value). In this representation, a perfect match has a
distance of zero from the reference pattern. In an improved embodiment of the invention
the contribution of each feature is weighted according to the relative amplitudes
of the feature waveforms stored in the central computer database. This has the effect of
assigning more weight to features that are expected to have a higher signal-to-noise ratio.
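
The figure of merit described above can be sketched in a few lines of Python. The unweighted form follows the text directly. The weighted form is an assumption of this sketch: the text states only that the features are weighted by relative amplitude, so the way the weights enter the sum here is illustrative, and the function name is hypothetical.

```python
import math

def match_distance(correlations, weights=None):
    """Euclidean distance from a perfect match.

    correlations: one normalized correlation value per frequency band.
    weights: optional per-band weights (how they are applied is an
             assumption of this sketch, not taken from the text).
    """
    if weights is None:
        weights = [1.0] * len(correlations)
    total = 0.0
    for value, weight in zip(correlations, weights):
        distance = 1.0 - value          # individual distance for one band
        total += weight * distance * distance
    return math.sqrt(total)
```

With three bands and no weighting, a perfect match gives a distance of zero, three uncorrelated bands give the square root of 3, and three exactly opposite bands give the square root of 12.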

The minimum value of the resulting distance is then found for each of the
candidate patterns collected from the broadcast stations. This represents the best match for
each of the broadcast stations. The minimum of these is then selected as the broadcast
source that best matches the unknown feature packet from the portable monitoring unit. If
this value is less than a predetermined threshold, the feature packet is assumed to be the
same as the feature data from the corresponding broadcast station. The system then makes
the assertion that the survey member was listening to that radio or television station at that
particular time.
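
The selection step described above might be sketched as follows. The dictionary layout, the function name, and any particular threshold value are hypothetical and serve only to illustrate the rule of taking the smallest distance and accepting it only when it falls below the threshold.

```python
def identify_station(min_distance_by_station, threshold):
    """Return the best-matching station, or None if no match is close enough.

    min_distance_by_station: maps a station identifier to the minimum
    distance found for that station over the correlation window.
    """
    if not min_distance_by_station:
        return None
    station, distance = min(min_distance_by_station.items(), key=lambda item: item[1])
    # Accept the best candidate only if it is closer than the threshold.
    return station if distance < threshold else None
```

A result of None corresponds to the case in which the survey member is not presumed to have been listening to any monitored station.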

By collecting and processing these feature packets from many survey members in 
the context of many potential broadcast sources, comprehensive audience surveys can be 
conducted. Further, this can be done faster and more accurately than was possible using 
previous methods. 

Description Of The Drawings

The features, objects, and advantages of the present invention will become more 
apparent from the detailed description set forth below when taken in conjunction with the 
following drawings: 

Figure 1 illustrates the functional components of the invention and how they
interact to function as an audience measurement system. Audience survey panel members
wear portable monitor units that collect samples of audio in their environment. This
includes audio signals from broadcast radio and television receivers. The radio and
television broadcast signals in a survey market are also received by a set of receivers
connected to a central computer. Audio features from all of the receivers are recorded in a
database on the central computer. When not in use, portable monitor units are placed in
docking stations where they can be interrogated by the central computer via dialup
modems. Audio feature samples transferred from the portable monitor units are then
matched with audio features of multiple broadcast stations stored in the database. This
allows the system to determine which radio and television programs are being viewed or
heard by each panel member.

Figure 2 is a block diagram of a portable monitor unit. The portable monitoring
unit contains a microphone for gathering audio. This audio signal is amplified and
lowpass filtered to restrict frequencies to a little over 3 kHz. The filtered signal is then
digitized using an analog to digital converter. Waveform samples are then transferred to a
digital signal processor. A low power timer operating from a separate lithium battery
activates the digital signal processor at intervals of approximately one minute. It will be
understood by those skilled in the art that the digital processor can collect the samples at
any periodic interval, and that use of a one-minute period is a matter of design choice and
should not be considered as limiting of the scope of the invention. The digital signal
processor then reads samples from the analog to digital converter and extracts features
from the audio waveform. The audio features are then compressed and stored in a
non-volatile memory. Compressed feature packets with time tags are later transferred through
a docking station to the central computer. A rechargeable battery is also included.

Figure 3 shows the three frequency bands that are used for feature extraction in a
particularly preferred embodiment of the present invention. The energy in each of these
three frequency bands is sampled approximately ten times per second to produce feature
waveforms.

WO 02/13396    PCT/US01/24338

Figure 4 illustrates the major components of the central computer that
continuously captures broadcast audio from multiple receivers and matches feature
packets from portable units with possible broadcast sources. A set of audio amplifiers and
lowpass antialias filters provide appropriate gain and restrict the audio frequencies to a
little over 3 kHz. A channel multiplexer rapidly scans the filter outputs and transfers the
waveforms sequentially to an analog to digital converter producing a multiplexed digital
time series. A digital signal processor performs a spectrum analysis and produces energy
measurements of each of three frequency bands from each of the input channels. These
feature samples are then transferred to a host computer and stored for later comparison.
The host computer contains a bank of modems that are used to interrogate the portable
monitor units while they are docked. Feature data packets are transferred from the
portable units during this interrogation. One or more digital signal processors are
connected to the host computer to perform the feature pattern recognition process that
identifies which broadcast channel, if any, matches the unknown feature packets from the
portable monitoring units.

Figure 5 is a block diagram of the docking station for the portable monitor unit.
The docking station contains four components. The first component is a data interface that
connects to the portable unit. This interface may include an electrical connection or an
infrared link. The data interface connects to a modem that allows telephone
communication and transfer of data. A battery charger in the docking station is used to
recharge the battery in the portable unit. A modular power supply is included to provide
power to the other components.

Figure 6 illustrates an expanded survey system that is intended to operate in
multiple cities or markets. A wide area network connects a group of remotely located
signal collection systems with a central site. Each of the signal collection systems captures
broadcast audio in its region and stores features. It also interrogates the portable
monitoring units and gathers the stored feature packets. Data packets from the remote sites
are transferred to the central site for processing.

Figure 7 is a flow chart of the audio signal acquisition strategy for the portable
monitoring units. The portable monitoring units activate periodically and compute features
of the audio in the environment. If there is sufficient audio power the features are
compressed and stored.

Figure 8 is a flow chart of procedures used to collect and manage audio features
received at central collection sites. This includes the three separate processes of audio
collection, feature extraction, and deletion of old feature data.

Figure 9 is a flow chart of the packet identification procedure. Packets are first
synchronized with the database. Corresponding data blocks from broadcast audio sources
are then matched to find the minimum weighted Euclidean distance to the unknown
packet. If this distance is less than a threshold, the unknown packet is identified as
matching the broadcast.

Figure 10 is a flow chart of the pattern matching procedure. Unknown feature
packets are first zero padded to double their length and then correlated with double length
feature segments taken from the reference features on the central computer. The weighted
Euclidean distance is then computed from the correlation values and the relative
amplitudes of the features stored in the reference patterns.
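
The correlation of a packet against a double length reference segment can be illustrated with a direct time-domain computation. This Python sketch is not the FFT-based fast correlation method named in the text, which would be preferred for speed. The normalization by the energy of each reference window is an assumption of this sketch, and the function name is hypothetical.

```python
import math

def normalized_correlation(packet, reference):
    """Normalized correlation of `packet` at each alignment within `reference`.

    Direct time-domain computation, used here for clarity in place of the
    FFT-based method. Normalizing by the energy of each reference window
    is an assumption of this sketch.
    """
    n = len(packet)
    packet_norm = math.sqrt(sum(x * x for x in packet))
    values = []
    for lag in range(len(reference) - n + 1):
        window = reference[lag:lag + n]
        window_norm = math.sqrt(sum(x * x for x in window))
        denom = packet_norm * window_norm
        dot = sum(p * r for p, r in zip(packet, window))
        values.append(dot / denom if denom > 0 else 0.0)
    return values
```

With a 64-sample packet and a 128-sample reference segment this yields 65 alignments. At ten feature samples per second that spans about 6.4 seconds, which is consistent with the correlation window of approximately six seconds described elsewhere in the specification.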

Figure 11 illustrates the process of averaging successive weighted distances to
improve the signal-to-noise ratio and reduce the false detection rate. This is an exponential
process where old data have a smaller effect than new data.
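
An exponential averaging process of this kind can be sketched as a first-order recursive average. The smoothing constant `alpha` and the function name are hypothetical, since the text does not specify a value or a particular recursion.

```python
def smooth_distances(distances, alpha=0.5):
    """Exponentially average successive distances.

    alpha is the weight given to the newest value (an assumed parameter);
    older values decay geometrically by a factor of (1 - alpha) per step.
    """
    smoothed = []
    average = None
    for distance in distances:
        if average is None:
            average = distance          # first value initializes the average
        else:
            average = alpha * distance + (1.0 - alpha) * average
        smoothed.append(average)
    return smoothed
```

For instance, with `alpha` set to 0.5 the sequence 1.0, 0.0, 0.0 is smoothed to 1.0, 0.5, 0.25, so a single poor match is progressively discounted as better matches follow.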

Figure 12 illustrates a flow diagram showing how program source information is
compiled. The program source selected by each user of a portable monitoring unit during
each of a plurality of different time periods is identified in accordance with a match
between the information recorded by the portable monitoring units and the information
recorded by the central broadcast collection facility. A report of estimates of audience size
and characteristics is then generated in accordance with the compiled source
information.

Figure 13 illustrates an exemplary report estimating audience size and
characteristics generated in accordance with the present invention.

Detailed Description Of The Preferred Embodiment 
The audience measurement system according to the invention consists of a
potentially large number of body-worn portable collection units 4 and several central
computers 7 located in various markets. The portable monitoring units 4 periodically
sample the audio environment and store features representing the structure of the audio
presented to the wearer of the device. The central computers continuously capture and
store audio features from all available broadcast sources 1 through direct connections to
radio and television receivers 6. The central computers 7 periodically interrogate the
portable units 4 while they are idle in docking stations 10 at night via telephone
connections and modems 9. The sampled audio feature packets are then transferred to the
central computers for comparison with the broadcast sources. When a match is found, the
presumption is that the wearer of the portable unit was listening to the corresponding
broadcast station. The resulting identification statistics are analyzed in accordance with
the methodology shown in Figure 12 to construct reports of the listening habits of the
users. An example of one such report is shown in Figure 13.

In typical operation, the portable monitoring units 4 compress the audio feature
samples to 200 bytes per sample. Sampling at intervals of one minute, the storage
requirements are 200 bytes per minute or 12 kilobytes per hour. During quiet intervals,
feature packets are not stored. It is estimated that about 50 percent of the samples will be
quiet. The average storage requirement is therefore about 144 kilobytes per day or
approximately 1 Megabyte per week. The portable monitoring units are capable of storing
about one month of compressed samples.
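
The storage figures above can be checked with a short calculation. The constant names are hypothetical, and the 30-day month used for the last line is an assumption of this sketch.

```python
BYTES_PER_PACKET = 200      # one compressed feature packet
PACKETS_PER_HOUR = 60       # one packet per minute
QUIET_FRACTION = 0.5        # estimated share of packets that are not stored

bytes_per_hour = BYTES_PER_PACKET * PACKETS_PER_HOUR              # 12,000 bytes
bytes_per_day = bytes_per_hour * 24 * (1.0 - QUIET_FRACTION)      # 144,000 bytes
bytes_per_week = bytes_per_day * 7                                # 1,008,000 bytes
bytes_per_month = bytes_per_day * 30                              # assumes a 30-day month
```

Under these assumptions a month of samples occupies roughly 4.3 Megabytes, which gives a sense of the memory capacity implied for the portable unit.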

If the portable monitoring units are interrogated daily, approximately one minute 
will be required to transfer the most recent samples to a central computer or collection 
site. The number of modems 9 required at the central computer 7 or collection site 33 
depends on the number of portable monitoring units 4. 

In a single market or a relatively small region, a central computer 7 receives
broadcast signals directly and stores feature data continuously on its local disk 8.
Assuming that on average a market will have 10 TV stations and 50 radio stations, the
required storage is about 173 Megabytes per day or 1210 Megabytes per week. Data older
than one week is deleted. Obviously, as more sources are acquired through, e.g., satellite
network feeds and cable television, the storage requirements increase. However, even with
500 broadcast sources the system needs only 10 Gigabytes of storage for a week of
continuous storage.

The recognition process requires that the central computer 7 locate time intervals
in the stored feature blocks that are time aligned (within a few seconds) with the unknown
feature packet. Since each portable monitoring unit 4 produces one packet per minute, the
processing load with 500 broadcast sources is 500 pattern matches per minute or about 8
matches per second for each portable monitoring unit. Assuming that there are 500
portable monitoring units in a market the system must perform about 4000 matches per
second.
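
The processing load estimate can be reproduced with a simple calculation. The function name and parameter names are hypothetical.

```python
def matches_per_second(num_sources, num_units, packets_per_minute=1):
    """Pattern matches per second when every packet is compared with every source."""
    return num_sources * num_units * packets_per_minute / 60.0
```

With 500 sources and one unit this gives about 8.3 matches per second, and with 500 units about 4167 matches per second. The figure of about 4000 in the text follows from rounding the per-unit rate down to 8.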

When deployed on a large scale in many markets the overall system architecture is
somewhat different as is illustrated in Figure 6. There are separate remote signal
collection computers 33 installed in each city or market. The remote computers 33 record
the broadcast sources in their particular markets as described above. In addition, they
interrogate the portable monitoring units 34 in their area by modem 32 and download the
collected feature packets. The signal collection computers 33 are connected to a central
site by a wide area data communication network 35. The central computer site consists of
a network 37 of computers 39 that can share the pattern recognition processing load. The
local network 37 is connected to the wide area network 35 to allow the central site
computers 39 to access the collected feature packets and broadcast feature data blocks. In
operation, a central computer 39 downloads a day's worth of feature packets from a
portable monitoring unit 34 using modem 32. Broadcast time segments that correspond to
the packet times are then identified and transferred to the central site. The identification is
then performed at the central site. Once an initial identification has been made, it is
confirmed by matching subsequent packets with broadcast source features from the same
channel as the previous recognition. This reduces the amount of data that must be
transferred from the remote collection computer to the central site. This is based on the
assumption that a listener will continue to listen (or stay tuned) to the same station for
some amount of time. When a subsequent match fails, the remaining channels are
downloaded for pattern recognition. This continues until a new match has been found. The
system then reverts to the single-channel tracking mode.
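
The single-channel tracking strategy described above can be sketched as a small loop. In this illustrative Python sketch, `matches` is a hypothetical callable that stands in for the download and pattern recognition steps, and the function name and data representation are assumptions.

```python
def track_packets(packets, all_channels, matches):
    """Identify a channel for each packet, preferring the last matched channel.

    matches(packet, channel) -> bool is a hypothetical stand-in for
    downloading the channel's features and running the pattern match.
    """
    current = None
    results = []
    for packet in packets:
        # Single-channel tracking: test only the previously matched channel.
        if current is not None and matches(packet, current):
            results.append(current)
            continue
        # Tracking failed (or not yet started): search the remaining channels.
        failed = current
        current = None
        for channel in all_channels:
            if channel != failed and matches(packet, channel):
                current = channel
                break
        results.append(current)
    return results
```

While a listener stays on one station, each packet costs a single comparison. The full search over the remaining channels is needed only when the tracked channel stops matching.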

The above process is repeated for all portable monitoring units 34 in all markets.
In instances where markets overlap, feature packets from a particular portable unit can be
compared with data from each market. This is accomplished by downloading the
appropriate channel data from each market. In addition, signals that are available over a
broad area such as satellite feeds, direct satellite broadcasts, etc. are collected directly at
the central site using one or more satellite receivers 36. This includes many sources that
are distributed over cable networks such as movie channels and other premium services.
This reduces the number of sources that must be collected remotely (and redundantly) by
the signal collection computers.

An additional capability of this system configuration is the ability to match
broadcast sources in different markets. This is useful where network affiliates may have
several different selections of programming.

In the preferred embodiment of the portable monitoring unit shown in Figure 2 the
audio signal received by small microphone 11 in a portable unit is amplified, lowpass
filtered, and digitized by an analog to digital converter 13. The sample rate is 8
kilosamples per second, resulting in a Nyquist frequency of 4 kHz. To avoid alias
distortion, an analog lowpass filter 12 rejects frequencies greater than about 3.2 kHz. The
analog to digital converter 13 sends the audio samples to a digital signal processing
microprocessor 17 that performs the audio processing and feature extraction. The first step
in this processing is spectrum analysis and partitioning of the audio spectrum into three
frequency bands as shown in Figure 3.

The frequency bands have been selected to contain approximately equal power on
average. In one embodiment, the frequency bands are:

Band 1: 50 Hz to 500 Hz
Band 2: 500 Hz to 1500 Hz
Band 3: 1500 Hz to 3250 Hz

It will be understood by those skilled in the art that other frequency bands may be
used to implement the teachings of the present invention.

The spectrum analysis is performed by periodically performing Fast Fourier
Transforms (FFT's) on blocks of 64 samples. This produces spectra containing 32
frequency "bins". The power in each bin is found by squaring its magnitude. The power in
each band is then computed as the sum of the power in the corresponding frequency bins.
A magnitude value is then computed for each band by taking the square root of the
integrated power. The mean value of each of these streams is then removed by using a
recursive high-pass filter. The data rate and bandwidth must then be reduced. This is
accomplished using polyphase decimating lowpass filters. Two filter stages are employed
for each of the three feature streams. Each of these filters reduces the sample rate by a
factor of five, resulting in a sample rate of 10 samples per second (per stream) and a
bandwidth of about 4 Hz. These are the audio data measurements that are used as features
in the pattern recognition process.
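
The first part of this processing, from one 64-sample block to three band magnitudes, can be sketched in Python as follows. This is a simplified illustration. It uses a direct discrete Fourier transform in place of an FFT, it omits the mean removal and decimation stages, and the assignment of the 125 Hz bins to bands (upper band edge inclusive) is an assumption. The function and constant names are hypothetical.

```python
import cmath
import math

SAMPLE_RATE = 8000                      # samples per second
BLOCK_SIZE = 64                         # samples per transform block
BIN_WIDTH = SAMPLE_RATE / BLOCK_SIZE    # 125 Hz per bin
BANDS_HZ = [(50, 500), (500, 1500), (1500, 3250)]

def band_magnitudes(block):
    """Return one magnitude per band for a single block of audio samples."""
    if len(block) != BLOCK_SIZE:
        raise ValueError("expected a block of 64 samples")
    # Power in each of the 32 bins, computed with a direct DFT for clarity.
    power = []
    for k in range(BLOCK_SIZE // 2):
        coeff = sum(x * cmath.exp(-2j * math.pi * k * n / BLOCK_SIZE)
                    for n, x in enumerate(block))
        power.append(abs(coeff) ** 2)
    magnitudes = []
    for low, high in BANDS_HZ:
        # Assumed bin assignment: bins whose center lies in (low, high].
        band_power = sum(p for k, p in enumerate(power)
                         if low < k * BIN_WIDTH <= high)
        magnitudes.append(math.sqrt(band_power))
    return magnitudes
```

As a check, a pure 1000 Hz tone falls exactly on bin 8, so nearly all of its energy appears in the second band and almost none in the first or third.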

A similar process is performed at the central computer site as shown in Figure 4.
However, audio signals are obtained from direct connections to radio and television
broadcast receivers. Since many audio sources must be collected simultaneously, a set of
preamplifiers and analog lowpass filters 20 is included. The outputs of these filters are
connected to a channel multiplexer 21 that switches sequentially between each audio
signal and sends samples of these signals to the analog to digital converter 22. A digital
signal processor 23 then operates on all of the audio time series waveforms to extract the
features.

To reduce the storage requirements in both the portable units and the central
computers, the system employs mu-law compression of the feature data. This reduces the
data by a factor of two, compressing a 16-bit linear value to an eight bit logarithmic value.
This maintains the full dynamic range while retaining adequate resolution for accurate
correlation performance. The same feature processing is used in both the portable
monitoring units and the central computers. However, the portable monitoring units
capture brief segments of 64 feature samples at intervals of approximately one minute as
triggered by a timer in the portable monitoring unit. Central computers record continuous
streams of feature data.
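
The effect of mu-law compression can be illustrated with the continuous mu-law curve. The following Python sketch assumes a mu value of 255 and a signed 8-bit code in the range -127 to 127. It is not a bit-exact telephony codec and is not taken from the specification. The function names are hypothetical.

```python
import math

MU = 255.0   # assumed companding constant

def mulaw_compress(sample):
    """Compress a signed 16-bit linear sample to a signed 8-bit log code."""
    x = max(-1.0, min(1.0, sample / 32768.0))
    y = math.copysign(math.log1p(MU * abs(x)) / math.log1p(MU), x)
    return int(round(y * 127))

def mulaw_expand(code):
    """Approximately invert mulaw_compress."""
    y = code / 127.0
    x = math.copysign(math.expm1(abs(y) * math.log1p(MU)) / MU, y)
    return int(round(x * 32767))
```

Small values are represented with fine steps and large values with coarse steps, so the relative error stays roughly constant across the dynamic range. This is the property that lets an 8-bit code cover the range of a 16-bit linear value.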



The portable monitoring unit is based on a low-power digital signal processor of
the type that is frequently used in such applications as audio processing for digital cellular
telephones. Most of the time this processor is in an idle or sleep condition to conserve
battery power. However, an electronic timer operates continuously and activates the DSP
at intervals of approximately one minute. The DSP 17 collects about six seconds of audio
from the analog to digital converter 13 and extracts audio features from the three
frequency bands as described previously. The value of the timer 15 is also read for use in
time marking the collected signals. The portable monitoring unit also includes a
rechargeable battery 19 and a docking station data interface 18.

In addition to the features that are collected, the total audio power present in the 
six-second block is computed to determine if an audio signal is present. The audio signal 
power is then compared with an activation threshold. If the power is less than the 
threshold the collected data are discarded, and the DSP 17 returns to the inactive state 
until the next sampling interval. This avoids the need to store data blocks that are 
collected while the user is asleep or in a quiet environment. If the audio power is greater 
than the threshold, then the data block is stored in a non-volatile memory 16. 
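
The activation test can be sketched as follows. Mean-square power is assumed as the measure of "total audio power", and the threshold value is left to the caller; neither is fixed by the description. The case of power exactly equal to the threshold is not addressed in the text and is treated here as "store".

```python
def block_power(samples):
    """Mean-square power of one block of audio samples (assumed power measure)."""
    return sum(s * s for s in samples) / len(samples)


def should_store(samples, threshold):
    """Return True when the block is loud enough to keep, False to discard it."""
    return block_power(samples) >= threshold
```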

Feature data to be stored are organized as 64 samples of each of the three feature 
streams. These data are first mu-law compressed from 16-bit linear samples to 8-bit 
logarithmic samples. The resulting data packets therefore contain 192 data bytes. The 
data packets also contain a four-byte unit identification code and a four-byte timer value 
for a total of 200 bytes per packet. The data packets are stored in a non-volatile flash 
memory 16 so that they will be retained when power is not applied. After storing the data 
packet, the unit returns to the sleep state until the next sampling interval. This procedure 
is illustrated in flow-chart form in Figure 7. 
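
The 200-byte packet can be illustrated with a short sketch. The field order (features, then unit identification code, then timer) and the big-endian byte order are assumptions; the description gives only the field sizes.

```python
import struct

SAMPLES_PER_BAND = 64
BANDS = 3
# Assumed layout: 192 feature bytes, 4-byte unit ID, 4-byte timer, big-endian.
PACKET_FORMAT = ">192sII"


def pack_packet(bands, unit_id, timer):
    """Build one 200-byte packet from three bands of 64 compressed samples."""
    if len(bands) != BANDS or any(len(b) != SAMPLES_PER_BAND for b in bands):
        raise ValueError("expected 3 bands of 64 compressed samples")
    payload = b"".join(bytes(b) for b in bands)
    return struct.pack(PACKET_FORMAT, payload, unit_id, timer)


def unpack_packet(packet):
    """Split a packet back into its three bands, unit ID, and timer value."""
    payload, unit_id, timer = struct.unpack(PACKET_FORMAT, packet)
    bands = [payload[i * SAMPLES_PER_BAND:(i + 1) * SAMPLES_PER_BAND]
             for i in range(BANDS)]
    return bands, unit_id, timer
```

With this layout, 3 x 64 + 4 + 4 = 200 bytes, matching the packet size given above.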

Figure 5 is a block diagram of the portable unit docking station 10. The docking 
station includes a data interface 28 to the portable unit 4 and a dialup modem 29 that is 
used to communicate with modems 9 that are connected to the central computer 7. An AC 
power supply 31 supplies power to the docking station and also powers a battery charger 
30 that is used to recharge the battery 19 in the portable monitoring unit 4. 

When the portable monitoring unit 4 is in its docking station 10 and communicates 
with a central computer 7, packets are transferred in reverse order. That is, the newest data 
packets are transferred first, proceeding backwards in time. The central computer 
continues to transfer packets until it encounters a packet that has been previously 
transferred. 
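
A minimal sketch of the newest-first transfer follows. Identifying a packet by its timer value, and representing the central computer's records as a set, are assumptions made for illustration.

```python
def download_new_packets(stored_packets, already_received):
    """Return packets newest-first, stopping at the first one already transferred.

    stored_packets: list of (timer, data) tuples in oldest-to-newest order.
    already_received: set of timer values held by the central computer.
    """
    fresh = []
    for timer, data in reversed(stored_packets):
        if timer in already_received:
            break  # everything older was transferred in an earlier session
        fresh.append((timer, data))
    return fresh
```

Stopping at the first known packet keeps each session short, since only data collected since the previous download is sent.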

Each portable monitoring unit 4 optionally includes a motion detector or sensor 
(not shown) that detects whether or not the device is actually being worn or carried by the 
user. Data indicating movement of the device is then stored (for later downloading and 
analysis) along with the audio feature information described above. In one embodiment, 
audio feature information is discarded or ignored in the survey process if the output of the 
motion detector indicates that the device 4 was not actually being worn or carried during a 
significant period of time when the audio information was being recorded. 

Each portable monitoring unit 4 also optionally includes a receiver (not shown) 
used for determining the position of the unit (e.g., a GPS receiver, a cellular telephone 
receiver, etc.). Data indicating position of the device is then stored (for later downloading 
and analysis) along with the audio feature information described above. In one 
embodiment, the downloaded position information is used by the central computer to 
determine which signal collection station's features to access for comparison. 

In contrast with the portable monitoring units that sample the audio environment 
periodically, the central computer must operate continuously, storing feature data blocks 
from many audio sources. The central computer then compares feature packets that have 
been downloaded from the portable units with sections of audio files that occurred at the 
same date and time. There are three separate processes operating in the data collection 
and storage aspect of central computer operation. The first of these is the collection of 
digitized audio data and its storage on the disks 8 of the central computer. The 
second task is the extraction of feature data and the storage of time-tagged blocks of 
feature data on the disk. The third task is the automatic deletion of feature files that are old 
enough to be considered irrelevant (one week). These processes are 
illustrated in Figure 8. 

Audio signals may be received from any of a number of sources including 
broadcast radio and television, satellite distribution systems, subscription services, and the 
internet. Digitized audio signals are stored for a relatively short time (along with time 
markers) on the central computer pending processing to extract the audio features. It is 
frequently beneficial to directly compute the features in real-time using special purpose 
DSP boards that combine analog to digital conversion with feature extraction. In this case 
the temporary storage of raw audio is greatly reduced. 

The audio feature blocks are computed in the same manner as for the portable 
monitoring units. The central computer system 7 selects a block of audio data from a 
particular channel or source and performs a spectrum analysis. It then integrates the power 
in each of three frequency bands and outputs a measurement. Sequences of these 
measurements are lowpass filtered and decimated to produce a feature sample rate of 10 
samples per second for each of the three bands. Mu-law compression is used to produce 
logarithmic amplitude measurements of one byte each, reducing the storage requirements. 
Feature samples are gathered into blocks, labeled with their source and time, and stored on 
the disk. This process is repeated for all available data blocks from all channels. The 
system then waits for more audio data to become available. 

In order to control the requirement for disk file storage, feature files are labeled 
with their date and time of initiation. For example, a file name may be automatically 
constructed that contains the day of the week and hour of the day. An independent task 
then scans the feature storage areas and deletes files that are older than a specified 
amount. While the system expects to interrogate portable monitoring units on a daily basis 
and to compare their collected features with the database every day, there will be cases 
where it will not be possible to interrogate some of the portable units for several days. 
Therefore, feature data are retained at the central computer site for about a week. After 
that, the results will no longer be useful. 
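
The file labeling and clean-up task can be sketched as below. The file name pattern, the `.feat` extension, and the in-memory mapping of names to start times are hypothetical; only the use of day of week and hour in the name, and the retention of about one week, come from the description.

```python
from datetime import datetime, timedelta

RETENTION = timedelta(days=7)  # "about a week" per the description
DAY_NAMES = ["Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun"]


def feature_file_name(source_id, start):
    """Build a name containing the day of the week and hour (format is hypothetical)."""
    return f"{source_id}_{DAY_NAMES[start.weekday()]}_{start.hour:02d}.feat"


def expired(files, now):
    """Given a mapping of file name -> start datetime, list names past retention."""
    return sorted(name for name, start in files.items() if now - start > RETENTION)
```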

When the central computer 7 compares audio feature blocks stored on its own disk 
drive 8 with those from a portable monitoring unit 4, it must match its time markers with 
those transferred from the portable monitoring unit. This reduces the amount of searching 
that must be done, improving the speed and accuracy of the processing. 

Each portable monitoring unit 4 contains its own internal clock 15. To avoid the 
need to set this clock or maintain any specific calibration, a simple 32-bit counter is used 
that is incremented at a 10 Hz rate. This 10 Hz signal is derived from an accurate crystal 
oscillator. In fact, the absolute accuracy of this oscillator is not very important. What is 
important is the stability of the oscillator. The central site interrogates each portable 
monitoring unit at intervals of between one day and one week. As part of this 
procedure the central site reads the current value of the counter in the portable monitoring 
unit. It also notes its own time count and stores both values. To synchronize time the 
system subtracts the time count that was read from the portable unit during the previous 
interrogation from the current value. Similarly, the system computes the number of counts 
that occurred at the central site (the official time) by subtracting its stored counter value 
from the current counter value. If the frequencies are the same, the same number of counts 
will have transpired over the same time interval (6.048 million counts per week). In this 
case the portable unit 4 can be synchronized to the central computer 7 by adding the 
difference between the starting counts to the time markers that identify each audio feature 
measurement packet. This is the simplest case. 

The typical case is where the oscillators are running at slightly different 
frequencies. It is still necessary to align the starting counter values, but the system must 
also compute a scale factor and apply it to time markers received from the portable 
monitoring unit. This scale factor is computed by dividing the number of counts from the 
central computer by the number of counts from the portable unit that occurred over the 
same time interval. The first order (linear) time synchronization requires computation of 
an offset and a scale factor to be applied to the time marks from the portable monitoring 
unit. 

Compute Offset: $\mathrm{Off} = S_c - S_p$ 
Compute Central Counts: $C_c = E_c - S_c$ 
Compute Portable Counts: $C_p = E_p - S_p$ 
Compute Scale Factor: $\mathrm{Scl} = C_c / C_p$ 

Time markers can then be converted from the portable monitoring unit to the 
central computer frame of reference : 

Convert Time Marker: $T_c = (T_p + \mathrm{Off}) \cdot \mathrm{Scl}$ 
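
The synchronization arithmetic can be sketched as follows. The function and parameter names are hypothetical; the start and end readings are taken to be the counter values noted at the previous and current interrogations, for the central computer (c) and the portable unit (p).

```python
COUNTS_PER_WEEK = 10 * 7 * 24 * 60 * 60  # 10 Hz counter over one week


def sync_parameters(s_c, e_c, s_p, e_p):
    """Return (offset, scale) from start/end counter readings at both sites."""
    offset = s_c - s_p
    central_counts = e_c - s_c
    portable_counts = e_p - s_p
    if portable_counts == 0:
        raise ValueError("portable counter did not advance")
    return offset, central_counts / portable_counts


def to_central_time(t_p, offset, scale):
    """Convert a portable time marker to the central frame, as written in the text."""
    return (t_p + offset) * scale
```

When both oscillators run at the same rate the scale factor is 1 and the conversion reduces to adding the offset. Note that, as written, the scale factor multiplies the whole offset-corrected count; an implementation might instead scale only the counts elapsed since the start reading, which would be a departure from the text.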

The remaining concern is short-term drift of the oscillator in the portable 
monitoring unit. This is primarily due to temperature changes. The goal is to stay within 
one second of the linearly interpolated time. The worst timing errors occur when the 
frequency deviates in one direction and then in the opposite direction. However, it has 
been determined that stability will be adequate over realistic temperature ranges. 

The audience survey system includes pattern recognition algorithms that determine 
which of many possible audio sources was captured by a particular portable monitoring 
unit 4 at a certain time. To accomplish this with reasonable hardware cost, the central 
computers 7 preferably employ high performance PCs 25 that have been augmented by 
digital signal processors 26 that have been optimized to perform functions such as 
correlations and vector operations. Figure 9 summarizes the signal recognition procedure. 

As discussed previously, it is important to synchronize the time markers received 
from the portable monitoring units 4 with the time tags applied to feature blocks stored on 
the central computer systems 7. Once this has been done, the system should be able to find 
stored feature blocks that are within about one second of the feature packets received 
from the portable units. The tolerance for time alignment is about +/- 3 seconds, leaving 
some room to deal with unusual situations. Additionally, the system can search for pattern 
matches outside of the tolerance window, but this slows down the processing. In cases 
where pattern matches are not found for a particular portable unit, the central computer 
can repeat all of the pattern matches using an expanded search window. Then when 
matches are found, their times of occurrence can be used as checkpoints to update the 
timing information. However, the need to resort to these measures may indicate a 
malfunction of the portable monitoring unit or its exposure to environmental extremes. 

The pattern recognition process involves computing the degree of match with 
reference patterns derived from features of each of the sources. As shown in Figure 9, this 
degree of match is measured as a weighted Euclidean distance in three-dimensional space. 
The distance metric indicates a perfect match as a distance of zero. Small distances 
indicate a closer match than large distances. Therefore, the system must find the source 
that produces the smallest distance to the unknown feature packet. This distance is then 
compared with a threshold value. If the distance is below the threshold, the system will 
report that the unknown packet matches the corresponding source and record the source 
identification. If the minimum distance is greater than the threshold, the system presumes 
that the unknown feature packet does not match any of the sources and records that the 
source is unknown. 
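
The decision rule can be sketched in a few lines. The mapping of source identifiers to distances and the use of `None` for an unknown source are illustrative choices, not part of the description.

```python
def identify_source(distances, threshold):
    """Return the source with the smallest distance, or None if it is not below threshold.

    distances: mapping of source identifier -> distance to the unknown packet.
    """
    if not distances:
        return None
    best = min(distances, key=distances.get)
    return best if distances[best] < threshold else None
```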

The basic pattern matching procedure is illustrated in Figure 10. Feature packets 
from a portable monitoring unit 4 contain 64 samples from each of the three bands. These 
must first be mu-law decompressed to produce 16-bit linear values. Each of the three 
feature waveforms is then normalized by dividing each value by the standard deviation 
(square root of power) computed over the three signals. This corrects for the audio volume 
to which the portable unit was exposed when the feature packet was collected. Each of the 
three normalized waveforms is then padded with a block of zeroes to a total length of 128 
samples per feature band. This is necessary to take advantage of a fast correlation 
algorithm based on the FFT. 

The system then locates a block of samples consisting of 128 samples of each 
feature as determined by the time alignment calculation. This will include the time offset 
needed to assure that the required three-second margins are present at the beginning and 
end of the expected location of the unknown packet. Next, the system calculates the 
cross-correlation functions between each of the three waveforms of the unknown feature packet 
and the corresponding source waveforms. In the fast correlation algorithm this requires 
that both the unknown and the reference source waveforms be transformed to the 
frequency domain using a fast Fourier transform. The system then performs a conjugate 
vector cross-product of the resulting complex spectra and then performs an inverse fast 
Fourier transform on the result. The resulting correlation functions are then normalized by 
the sliding standard deviation of each computed over a 64-sample window. 
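
The fast correlation for one feature band can be sketched as below, using a small recursive FFT so that the example is self-contained. The sliding normalization uses the root sum of squares of each 64-sample reference window together with that of the unknown packet, which is one reading of "sliding standard deviation" (square root of power) and should be treated as an assumption, as should all function names.

```python
import cmath
import math


def fft(x, inverse=False):
    """Recursive radix-2 FFT (unscaled); len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return list(x)
    even = fft(x[0::2], inverse)
    odd = fft(x[1::2], inverse)
    sign = 1 if inverse else -1
    out = [0j] * n
    for k in range(n // 2):
        tw = cmath.exp(sign * 2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + tw
        out[k + n // 2] = even[k] - tw
    return out


def normalized_correlation(unknown, reference):
    """Correlate a 64-sample packet against a 128-sample reference block.

    Returns one normalized correlation value per time delay (lags 0..63).
    """
    n, m = len(reference), len(unknown)
    padded = list(unknown) + [0.0] * (n - m)        # zero-pad to 128 samples
    spec_u = fft(padded)
    spec_r = fft(list(reference))
    product = [r * u.conjugate() for r, u in zip(spec_r, spec_u)]
    raw = [c.real / n for c in fft(product, inverse=True)]
    norm_u = math.sqrt(sum(v * v for v in unknown))
    result = []
    for lag in range(n - m):
        window = reference[lag:lag + m]
        denom = norm_u * math.sqrt(sum(v * v for v in window))
        result.append(raw[lag] / denom if denom > 0 else 0.0)
    return result
```

Because the unknown packet is zero-padded to 128 samples, the circular correlation produced by the FFT does not wrap around for the lags returned, so each value equals the direct correlation at that delay.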

Each of the three correlation functions representing the three frequency bands ranges 
from a maximum value of one for a perfect match, through zero for no correlation, to minus 
one for an exact opposite. Each of the correlation values is converted to a distance component by 
subtracting it from one. The Euclidean distance is preferably defined as set forth in 
equation (1) below as the square root of the sum of the squares of the individual 
components: 

$D = \left[ (1 - cv_1)^2 + (1 - cv_2)^2 + (1 - cv_3)^2 \right]^{1/2}$ (1) 

This results in a single number that measures how well a feature packet matches the 
reference (or source) pattern, combining the individual distances as though they were 
based on measurements taken in three-dimensional space. However, by virtue of 
normalizing the feature waveforms, each component makes an equal contribution to the 
overall distance regardless of the relative amplitudes of the audio in the three bands. In 
one embodiment, the present invention aims to avoid situations where background noise 
in an otherwise quiet band disturbs the contributions of frequency bands containing useful 
signal energy. Therefore, the system reintroduces relative amplitude information to the 
distance calculation by weighting each component by the standard deviations computed 
from the reference pattern as shown in equation (2) below. This must be normalized by 
the total magnitude of the signal: 

$D_w = \left[ \left( std_1^2 (1 - cv_1) \right)^2 + \left( std_2^2 (1 - cv_2) \right)^2 + \left( std_3^2 (1 - cv_3) \right)^2 \right]^{1/2} / \left[ std_1^2 + std_2^2 + std_3^2 \right]^{1/2}$ (2) 

The sequence of operations can be rearranged to combine some steps and eliminate others. 
The resulting weighted Euclidean distance automatically adapts to the relative amplitudes 
of the frequency bands and will tend to reduce the effects of broadband noise that is 
present at the portable unit and not at the source. 
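
The two distance measures can be sketched as follows. The plain distance follows equation (1) directly. For the weighted distance, the exponent applied to each standard deviation and the exact form of the normalizing term are assumptions of this sketch and should be checked against the original equation (2).

```python
import math


def euclidean_distance(cv):
    """Equation (1): distance from three correlation values, zero for a perfect match."""
    return math.sqrt(sum((1.0 - c) ** 2 for c in cv))


def weighted_distance(cv, std):
    """Distance weighted by per-band reference standard deviations.

    Assumed form: each component is weighted by std**2 and the result is
    normalized by the root sum of squares of the standard deviations.
    """
    num = math.sqrt(sum(((s ** 2) * (1.0 - c)) ** 2 for c, s in zip(cv, std)))
    den = math.sqrt(sum(s ** 2 for s in std))
    if den == 0:
        raise ValueError("reference pattern has no signal energy")
    return num / den
```

With this weighting, a band in which the reference carries almost no energy contributes almost nothing to the distance, even if its correlation is poor.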

A variation of the weighted Euclidean distance involves integrating or averaging 
successive distances calculated from a sequence of feature packets received from a 
portable unit as shown in Figure 11. In this procedure, the weighted distance is computed 
as above for the first packet. A second packet is then obtained and precisely aligned with 
feature blocks from the same source in the central computer. Again, the weighted 
Euclidean distance is calculated. If the two packets are from the same source, the 
minimum distance will occur at the same relative time delay in the distance calculation. 
For each of the 64 time delays in the distance array for a particular source the system 
computes a recursive update of the distance where the averaged distance is decayed 
slightly by multiplying it by a coefficient k that is less than one. The newly calculated 
distance is then scaled by multiplying it by (1-k) and adding it to the average distance. 
For a particular time delay value within the distance array the update procedure can be 
expressed as shown in equation (3) below: 

$\mathbf{D}_w(n) = k \cdot \mathbf{D}_w(n-1) + (1 - k) \cdot D_w(n)$ (3) 

Note that the bold notation Dw indicates the averaged value of the distance calculation, 
(n) refers to the current update cycle, and (n-1) refers to the previous update cycle. This 
process is repeated on subsequent blocks, recursively integrating more signal energy. The 
result of this is an improved signal-to-noise ratio in the distance calculation that reduces 
the probability of false detection. 
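
The recursive averaging can be sketched as below. The default k = 0.9 and the choice to seed the average with the first packet's distance array are assumptions; the description requires only that k be less than one.

```python
def update_average(avg, new, k=0.9):
    """Apply the recursive update to every time delay in a distance array.

    avg: previous averaged distances (None before the first packet).
    new: newly calculated weighted distances, one per time delay.
    """
    if not 0.0 <= k < 1.0:
        raise ValueError("k must be in [0, 1)")
    if avg is None:
        return list(new)  # first packet: nothing to average yet
    return [k * a + (1.0 - k) * d for a, d in zip(avg, new)]
```

A delay that repeatedly yields a small distance keeps a small average, while a delay that matched once by chance is pulled back up by later packets.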

The decision rule for this process is the same as for the un-averaged case. The 
minimum averaged distance from all sources is first found. This is compared with a 
distance threshold. If the minimum distance is less than the threshold, a detection has 
occurred and the source identification is recorded. Otherwise the system reports that the 
source is unknown. 

Referring now to Figure 12, there is shown a flow diagram illustrating how 
program source information is compiled and used to generate ratings reports (such as that 
shown in Fig. 13), in accordance with a preferred embodiment of the present invention. In 
step 1211, a random sample of candidate panelists is selected from a given population. 
The given population from which the random sample is drawn corresponds, for example, 
to viewers/listeners in a given electronic media market. In step 1212, a respondent panel is 
recruited from the candidate panelists selected in step 1211. For example, in step 1212, 
individual candidate panelists are contacted and asked to participate in a survey by 
wearing/carrying the portable monitoring units described herein. The candidate panelists 
that agree to participate in the survey are hereinafter referred to as "recruited 
respondents." In step 1213, the demographic characteristics (e.g., gender, age, income 
level, etc.) of each recruited respondent are recorded. In step 1214, a weight is associated 
with each recruited respondent based on the demographic characteristics associated with 
the particular recruited respondent. Techniques for determining the weight to associate 
with each respondent are well known in the survey art. Such weights typically aim to 
adjust the importance given to each member of a group of recruited respondents, based on 
the particular demographic characteristics of each such member, so that the group of 
recruited respondents as a whole more closely resembles (from a demographic view) the 
demographics of the overall population from which the recruited respondents were 
selected. Next, in steps 1221, 1222, 1223 and 1224, the program source selected by each 
user of a portable monitoring unit (i.e., each recruited respondent) during each of a 
plurality of different time periods is then identified in accordance with the methods 
described above, in order to detect matches between the information recorded by the 
portable monitoring units and information recorded by a central broadcast collection 
facility (e.g., central computer 7). In step 1225, a record of each recruited respondent's 
viewing/listening is recorded (preferably on computer disk) based on the detected matches 
described above. In step 1226, the matches recorded for each recruited respondent are 
adjusted based on the weights assigned in step 1214. Based on these weight-adjusted 
values, a report of estimates of audience size and characteristics for a particular electronic 
media market is generated in step 1227. An example of such a report is shown in Figure 
13. Reports generated in accordance with the present invention are then typically provided 
to subscribers such as electronic media advertisers and broadcasters, in exchange for a 
subscription fee. 
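
The weight adjustment of steps 1226 and 1227 can be illustrated with a minimal sketch for a single time period. Projecting each source's share of the total panel weight onto the market population is an assumed estimator; the description states only that matches are adjusted by the demographic weights. The record layout and names are hypothetical.

```python
def audience_estimates(respondents, population):
    """Estimate audience size per source for one time period.

    respondents: list of dicts with 'weight' and 'source' (None when no match).
    population: size of the market the panel represents.
    """
    total_weight = sum(r["weight"] for r in respondents)
    if total_weight == 0:
        return {}
    weighted = {}
    for r in respondents:
        if r["source"] is not None:
            weighted[r["source"]] = weighted.get(r["source"], 0.0) + r["weight"]
    # Scale each source's share of panel weight up to the full population.
    return {src: population * w / total_weight for src, w in weighted.items()}
```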

It will be understood by those skilled in the art that reports having 
formats/parameters other than those shown in Figure 13 may be generated in accordance 
with the teachings of the present invention, and that the report in Figure 13 is provided by 
way of example, and should not be deemed as limiting to the scope of the present 
invention. It will also be understood by those in the art that, in an alternative embodiment, 
reports may optionally be generated in accordance with the present invention without use 
of the demographic weights discussed above in connection with Figure 12. 

The previous description of the preferred embodiments is provided to enable any 
person skilled in the art to make and use the present invention. Various modifications 
to these embodiments will be readily apparent to those skilled in the art, and the generic 
principles defined herein may be applied to other embodiments without the use of the 
inventive faculty. Thus, the present invention is not intended to be limited to the 
embodiments shown herein but is to be accorded the widest scope consistent with the 
principles and novel features disclosed herein. 

What is claimed is: 

1. An audience survey system, comprising: 

(A) a plurality of portable monitoring units that are assigned to users 
that are members of an audience panel, wherein each portable monitoring unit records 
information representative of programming content of free field audio signals received by 
the portable monitoring unit; 

(B) a central broadcast collection facility that records information 
representative of audio signals transmitted from a plurality of program sources, wherein 
for each audio signal the information recorded by the central broadcast collection facility 
includes information representing programming content of the audio signal; and 

(C) a computer that compiles program source information by 
identifying a program source selected by each user of a portable monitoring unit during 
each of a plurality of different time periods in accordance with a match between the 
information recorded by the portable monitoring units and the information recorded by the 
central broadcast collection facility; 

(D) wherein the computer generates a report of estimated audience size 
and characteristics based upon the observed behavior of the audience panel in accordance 
with the compiled source information. 

2. The system of claim 1, wherein each portable monitoring unit periodically 
records the information representative of the programming content of the free field audio 
signals received by the portable monitoring unit. 

3. The system of claim 2, wherein the central broadcast collection facility 
continuously records the information representative of audio signals broadcast from the 
plurality of program sources. 

4. The system of claim 3, wherein the computer is coupled to the central 
broadcast collection facility, said system further comprising: 

(E) a plurality of docking stations each of which periodically downloads the 
information recorded by a portable monitoring unit to the computer. 

5. The system of claim 4, wherein each of the docking stations includes a 
modem for communicating with the computer, and a charger that charges a battery in a 
portable monitoring unit when the portable monitoring unit is positioned in the docking 
station. 

6. The system of claim 1, wherein each portable monitoring unit includes a 
microphone that receives free field audio signals associated with a programming source 
selected by the user of the portable monitoring unit. 

7. The system of claim 1, wherein each portable monitoring unit is worn or 
carried by a user. 

8. The system of claim 1, wherein the information representative of the 
programming content of the free field audio signals recorded by each portable monitoring 
unit includes a digitally compressed version of programming content associated with free 
field audio signals received by the portable monitoring unit. 

9. The system of claim 8, wherein the information recorded by the central 
broadcast collection facility includes a digitally compressed version of programming 
content associated with the audio signals received by the central broadcast collection 
facility. 

10. The method of claim 1, wherein the central broadcast collection facility 
and the computer that identifies the program source selected by each user of a portable 
monitoring unit during each of a plurality of different time periods are implemented using 
a common host computer. 

11. A method for generating a report of estimated audience size and 
characteristics and providing the report to subscribers of the report, comprising the steps 
of: 

(A) providing a plurality of portable monitoring units to users that are members 
of an audience panel, wherein each portable monitoring unit records information 
representative of programming content of free field audio signals received by the portable 
monitoring unit; 

(B) recording, at a central broadcast collection facility, information 
representative of audio signals broadcast from a plurality of program sources, wherein for 
each audio signal the information recorded by the central broadcast collection facility 
includes information representing programming content of the audio signal; and 

(C) compiling program source information by identifying the program source 
selected by each user of a portable monitoring unit during each of a plurality of different 
time periods in accordance with a match between the information recorded by the portable 
monitoring units and the information recorded by the central broadcast collection facility; 

(D) generating a report of estimates of audience size and characteristics based 
on the observed behavior of the audience panel and in accordance with the compiled 
source information; and 

(E) providing the report of estimates of audience size and characteristics to 
subscribers in exchange for payment of a subscription fee. 